8 research outputs found

    Shape Modeling with Spline Partitions

    Shape modelling (with methods that output shapes) is a new and important task in Bayesian nonparametrics and bioinformatics. In this work, we focus on Bayesian nonparametric methods for capturing shapes by partitioning a space using curves. In related work, the classical Mondrian process is used to partition spaces recursively with axis-aligned cuts, and is widely applied to multi-dimensional and relational data. The Mondrian process outputs hyper-rectangles. Recently, the random tessellation process was introduced as a generalization of the Mondrian process, partitioning a domain with non-axis-aligned cuts in a space of arbitrary dimension, and outputting polytopes. Motivated by these processes, we propose a novel parallelized Bayesian nonparametric approach to partition a domain with curves, enabling complex data shapes to be acquired. We apply our method to an HIV-1-infected human macrophage image dataset and to simulated datasets to illustrate our approach. We compare our method to support vector machines, random forests and state-of-the-art computer vision methods such as simple linear iterative clustering superpixel image segmentation. We develop an R package that is available at https://github.com/ShufeiGe/Shape-Modeling-with-Spline-Partitions.

    Random Tessellation Forests

    Space partitioning methods such as random forests and the Mondrian process are powerful machine learning methods for multi-dimensional and relational data, and are based on recursively cutting a domain. The flexibility of these methods is often limited by the requirement that the cuts be axis aligned. The Ostomachion process and the self-consistent binary space partitioning-tree process were recently introduced as generalizations of the Mondrian process for space partitioning with non-axis-aligned cuts in the two-dimensional plane. Motivated by the need for a multi-dimensional partitioning tree with non-axis-aligned cuts, we propose the Random Tessellation Process (RTP), a framework that includes the Mondrian process and the binary space partitioning-tree process as special cases. We derive a sequential Monte Carlo algorithm for inference, and provide random forest methods. Our process is self-consistent and can relax axis-aligned constraints, allowing complex inter-dimensional dependence to be captured. We present a simulation study, and analyse gene expression data of brain tissue, showing improved accuracies over other methods. Comment: 11 pages, 4 figures
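
    The recursive, axis-aligned cutting that the abstract above attributes to the Mondrian process can be sketched in a few lines. This is a minimal illustration of the standard Mondrian generative procedure, not the RTP or any code from the paper; the function name `sample_mondrian` and the representation of a box as a list of `(low, high)` intervals are assumptions made for this example.

```python
import random


def sample_mondrian(box, budget, rng):
    """Sample an axis-aligned Mondrian partition of `box`.

    box    : list of (low, high) intervals, one per dimension (hypothetical layout)
    budget : remaining lifetime; larger values give finer partitions
    rng    : a random.Random instance
    Returns a list of leaf boxes that tile the input box.
    """
    lengths = [hi - lo for lo, hi in box]
    # Waiting time until the next cut is exponential with rate equal to
    # the sum of the side lengths of the current box.
    cost = rng.expovariate(sum(lengths))
    if cost > budget:
        return [box]  # budget exhausted: this box is a leaf
    # Pick the dimension to cut with probability proportional to its length,
    # then pick the cut position uniformly along that side.
    d = rng.choices(range(len(box)), weights=lengths)[0]
    lo, hi = box[d]
    cut = rng.uniform(lo, hi)
    left, right = list(box), list(box)
    left[d] = (lo, cut)
    right[d] = (cut, hi)
    remaining = budget - cost
    # Recurse independently on both halves with the reduced budget.
    return (sample_mondrian(left, remaining, rng)
            + sample_mondrian(right, remaining, rng))


if __name__ == "__main__":
    leaves = sample_mondrian([(0.0, 1.0), (0.0, 1.0)], 3.0, random.Random(0))
    print(len(leaves), "hyper-rectangles")
```

    Every leaf is a hyper-rectangle because each cut is perpendicular to one axis; relaxing that restriction is what the non-axis-aligned generalizations described above address.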

    Genome-Wide Association with Uncertainty in the Genetic Similarity Matrix

    Genome-wide association studies (GWASs) are often confounded by population stratification and structure. Linear mixed models (LMMs) are a powerful class of methods for uncovering genetic effects while controlling for such confounding. LMMs include random effects for a genetic similarity matrix, and they assume that the true genetic similarity matrix is known. However, uncertainty about the phylogenetic structure of a study population may degrade the quality of LMM results. This may happen in bacterial studies in which the number of samples or loci is small, or in studies with low-quality genotyping. In this study, we develop methods for linear mixed models in which the genetic similarity matrix is unknown and is derived from Markov chain Monte Carlo estimates of the phylogeny. We apply our model to a GWAS of multidrug resistance in tuberculosis, and illustrate our methods on simulated data.

    Statistical machine learning in computational genetics

    Statistical machine learning has played a key role in many areas, such as biology, health sciences, finance and genetics. Important tasks in computational genetics include disease prediction, capturing shapes within images, computation of genetic sharing between pairs of individuals, genome-wide association studies and image clustering. This thesis develops several learning methods to address these computational genetics problems. Firstly, motivated by the need for fast computation of genetic sharing among pairs of individuals, we propose the fastest algorithms for computing the kinship coefficient of a set of individuals with a known large pedigree. Moreover, we consider the possibility that the founders of the known pedigree may themselves be inbred and compute the appropriate inbreeding-adjusted kinship coefficients, which has not been addressed in the literature. Secondly, motivated by an imaging genetics study of the Alzheimer's Disease Neuroimaging Initiative, we develop a Bayesian bivariate spatial group lasso model for multivariate regression analysis, which can be used to examine the influence of genetic variation on brain structure and accommodates the correlation structures typically seen in structural brain imaging data. We develop a mean-field variational Bayes algorithm and a Gibbs sampling algorithm to fit the model. We also incorporate Bayesian false discovery rate procedures to select SNPs. The new spatial model demonstrates superior performance over a standard model in our application. Thirdly, we propose the Random Tessellation Process (RTP) to model complex genetic data structures to predict disease status. The RTP is a multi-dimensional partitioning tree with non-axis-aligned cuts. We develop a sequential Monte Carlo (SMC) algorithm for inference. Our process is self-consistent and can relax axis-aligned constraints, allowing complex inter-dimensional dependence to be captured. Fourthly, we propose the Random Tessellation with Splines (RTS) to acquire complex shapes within images. The RTS provides a framework for describing Bayesian nonparametric models based on partitioning two-dimensional Euclidean space with splines. We also develop an inference algorithm that is "embarrassingly parallel". Finally, we extend the mixtures of spatial spline regression with mixed-effects model under the Bayesian framework to accommodate streaming image data. We propose an SMC algorithm to analyze brain images in an online fashion.
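
    To make the kinship computation mentioned in the thesis abstract above concrete, here is a minimal sketch of the textbook recursive definition of the kinship coefficient on a known pedigree. It is not the thesis's fast algorithm. The function name `kinship_matrix`, the index-based pedigree encoding, and the treatment of inbred founders (a founder's self-kinship is set to (1 + F) / 2 while distinct founders are still assumed unrelated) are assumptions made for this example.

```python
from functools import lru_cache


def kinship_matrix(parents, founder_inbreeding=None):
    """Kinship coefficients by the classical pedigree recursion.

    parents[i] is None for a founder, else a (father, mother) pair of
    indices. Individuals must be ordered so parents precede children.
    founder_inbreeding optionally maps a founder index to its inbreeding
    coefficient F (an assumption of this sketch, default 0).
    """
    inbreeding = founder_inbreeding or {}
    n = len(parents)

    @lru_cache(maxsize=None)
    def phi(i, j):
        if i < j:
            i, j = j, i  # make i the later individual, so i is no ancestor of j
        if parents[i] is None:
            # A founder is related only to itself and its descendants;
            # j <= i cannot be a descendant of i under the ordering.
            return 0.5 * (1.0 + inbreeding.get(i, 0.0)) if i == j else 0.0
        father, mother = parents[i]
        if i == j:
            # Self-kinship depends on how related the two parents are.
            return 0.5 * (1.0 + phi(father, mother))
        # Otherwise average the kinship of j with each parent of i.
        return 0.5 * (phi(father, j) + phi(mother, j))

    return [[phi(i, j) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Two founders (0, 1), two full siblings (2, 3), and a child of the siblings (4).
    pedigree = [None, None, (0, 1), (0, 1), (2, 3)]
    for row in kinship_matrix(pedigree):
        print(row)
```

    The recursion visits each pair once thanks to memoization, but it is still quadratic in the number of individuals and recursion depth grows with the number of generations, which is the kind of cost that motivates faster algorithms for large pedigrees.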

    Energy and Operation Characteristics of Electric Excavator With Innovative Hydraulic-Electric Dual Power Drive Boom System

    In existing electric excavators, the energy efficiency of the hydraulic system is less than 30% due to large throttling losses and wasted potential energy. To improve excavator energy efficiency, an electric excavator scheme using a hydraulic-electric dual-power drive boom system is proposed. A linear actuator, comprising an electro-mechanical unit and a hydraulic unit, is adopted in the boom system. The boom velocity is controlled by the electro-mechanical unit instead of a hydraulic valve to reduce throttling loss. The non-rod chamber of the linear actuator is connected to a hydraulic accumulator to reutilize the boom's gravitational potential energy. In addition, when the boom and other devices are operated together, the throttling loss caused by load differences among multiple actuators can be reduced because the linear actuator can compensate for the pump pressure. The working principle, control strategy, and characteristics of the proposed electric excavator were analyzed through theory and experiments. The results show that the proposed system can reduce throttling loss and efficiently reutilize the boom's gravitational potential energy. During the boom lifting and lowering process, the reutilization rate of the boom's gravitational potential energy is 67.6%, and the energy consumption is reduced by 66.1%. During the land levelling process, the throttling loss of the electric excavator is reduced by 49.6% and the energy consumption is reduced by 38.1%. The research results will provide new methods for the electrification of construction machinery.